AN EFFICIENT, RLS, DATA-DRIVEN ECHO CANCELLER
FOR FAST INITIALIZATION OF FULL-DUPLEX DATA TRANSMISSION


John M. Cioffi
IBM Research Laboratory
San Jose, CA 95193

ABSTRACT 

Computationally efficient Recursive-Least-Squares (RLS)
procedures are presented specifically for the adaptive adjustment
of the Data-Driven Echo Cancellers (DDECs) that are used in
voiceband full-duplex data transmission. The methods are shown
to yield very short learning times for the DDEC while they
simultaneously reduce computational requirements to below those
required for other least-squares procedures, such as those recently
proposed by Salz (1983). The new methods can be used with any
training sequence over any number of iterations, unlike any of the
previous fast-converging methods. The methods are based upon the
Fast Transversal Filter (FTF) RLS adaptive filtering algorithms
that were independently introduced by the authors of this paper;
however, several special features of the DDEC are introduced and
exploited to further reduce computation to the levels that would be
required for slower-converging stochastic-gradient solutions.
Several trade-offs between computation, memory, learning time,
and performance are also illuminated for the new initialization.

1.  INTRODUCTION 

Echo cancellers were suggested for use in 2-wire
full-duplex data transmission by Koll and Weinstein [1] in 1973.
Much other work concerning the data echo canceller has appeared
in [2-12,19]. Of particular concern in this paper is Mueller's
Data-Driven Echo Canceller (DDEC) [6]. The use of
stochastic-gradient LMS (Least-Mean-Square) algorithms in the
DDEC has led to unacceptably long training periods for the
full-duplex modem [2,11,12]. In this paper, we specifically
investigate the use of the Fast-Transversal-Filters (FTF)
Recursive-Least-Squares (RLS) adaptive algorithms [13-14] in the
DDEC to substantially reduce the necessary training period.

Because the transmitted data sequence is usually
"whitened" through scrambling prior to entering the transmitter
and DDEC, it was originally believed that the use of RLS
adaptive algorithms would lead to no improvement in the
convergence time of the DDEC in comparison to
stochastic-gradient techniques. However, Farrow [15], Honig
[11], and Salz [12] verified a significant convergence
improvement (about a factor of 5, see [11]) of the RLS (or
closely related) methods when the double-talking data signal was
intentionally silenced during the initial training phase of the data
transmission. However, there are several drawbacks of the
"Salz-Farrow" (SF) method in [11], [12], and [15]. Most of
these are the result of the SF method's absolute necessity for the
training sequence to be "pseudorandom" with very special
autocorrelation properties and with a period equal to the
number of coefficients (order) of the echo canceller, which
limits both the permissible orders (to, say, 7, 15, 31, 63, 127,
255, 511, ..., 2^n - 1) and the performance of the RLS DDEC.
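The "very special autocorrelation properties" can be made concrete with a small sketch. It is an illustration only, not the SF procedure itself. It assumes a binary maximal-length (m-) sequence as the pseudorandom sequence, generated by the recurrence a[k+4] = a[k] XOR a[k+1], which has period 2^4 - 1 = 15. The function names are hypothetical.

```python
def m_sequence_15():
    """One period of a length-15 m-sequence from a[k+4] = a[k] XOR a[k+1]."""
    bits = [1, 1, 1, 1]                    # any nonzero seed gives the same cycle
    while len(bits) < 15:
        bits.append(bits[-4] ^ bits[-3])   # a[k+4] = a[k] XOR a[k+1]
    return [1 - 2 * b for b in bits]       # map bit 0 -> +1, bit 1 -> -1


def periodic_autocorrelation(seq):
    """Periodic (cyclic) autocorrelation at every lag 0 .. len(seq)-1."""
    n = len(seq)
    return [sum(seq[m] * seq[(m + lag) % n] for m in range(n))
            for lag in range(n)]


SEQ = m_sequence_15()
ACF = periodic_autocorrelation(SEQ)
print(ACF)   # 15 at lag 0 and -1 at every other lag
```

The two-valued autocorrelation holds only over a full period, which is why a canceller that relies on it must have an order of the form 2^n - 1.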

This paper introduces FTF solutions that require less
computation than the SF method, permit training of the echo
canceller with any known training sequence of any length (long
enough to converge), and converge as fast as or faster than
the SF method. The freedom of choice in training sequence can
also result in a factor-of-2 or greater reduction in
learning time to reach the same echo canceller performance
level as the SF echo canceller. We specifically investigate many
interesting trade-offs between computational requirements and
the performance of the echo canceller.

This work was supported in part by the U.S. Army Research
Office, under Contract DAAG29-79-C-0215, and by the
Air Force Office of Scientific Research, Air Force Systems
Command, under Contract AF49-620-79-C-0058.

The new method's much less restrictive choice of training
sequence permits use of sequences that are "white"
(autocorrelation matrix is diagonal), such as those of
Milewski [17]. The use of such a sequence can lead to as much
as a 3 dB advantage over the pseudorandom sequence. Also, the
prewindowed FTF solutions do not require "priming" the echo
channel with N inputs before computation can begin, as is
necessary in the SF method, and are numerically stable over the
initialization time period (using the soft-constraint initialization
of [13,14]).
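The 3 dB figure can be checked with a short calculation. The sketch rests on assumptions that are not stated at this point in the paper: unit-magnitude real training symbols, N = 15, and the Section 2.5 result that the tap covariance is proportional to the inverse autocorrelation matrix. A primed length-N m-sequence then gives an autocorrelation matrix with N on the diagonal and -1 elsewhere, while an exactly white sequence gives N times the identity.

```python
import math

N = 15

# Autocorrelation matrix of a primed +/-1 m-sequence over one period.
R_PN = [[N if a == b else -1 for b in range(N)] for a in range(N)]

# Candidate inverse (I + J) / (N + 1), where J is the all-ones matrix.
R_PN_INV = [[(2 if a == b else 1) / (N + 1) for b in range(N)] for a in range(N)]

# Confirm the candidate really is the inverse: R_PN * R_PN_INV = I.
PRODUCT = [[sum(R_PN[a][c] * R_PN_INV[c][b] for c in range(N))
            for b in range(N)] for a in range(N)]

TRACE_PN = sum(R_PN_INV[a][a] for a in range(N))   # equals 2N / (N + 1)
TRACE_WHITE = N * (1.0 / N)                        # inverse of N*I has trace 1
LOSS_DB = 10.0 * math.log10(TRACE_PN / TRACE_WHITE)
print(round(LOSS_DB, 2))
```

The ratio 2N/(N + 1) approaches 2 as N grows, so the penalty approaches 3 dB for long cancellers.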

A possible disadvantage of the new method, however, is its
requirement of more read-only memory than the SF method, if
one wishes to keep computation to an absolute minimum. This
extra memory is used to pre-store certain quantities of the RLS
algorithms that are solely a function of the known training
sequence. Since practical experience dictates that the cost of
read-only memory is low in comparison to the cost of the other
signal-processing functions that appear in high-speed modems,
this possible disadvantage is minimal.

Section  2  reviews  and  analyzes  the  RLS  DDEC.  Section  3 
introduces  and  discusses  the  new  recursive  initialization 
procedures.  Finally,  Section  4  is  a  brief  conclusion.  A  longer 
version of this paper appears in [16].

2.  RLS  AND  THE  DDEC 

This  section  briefly  reviews  the  DDEC  and  the  application 
of  RLS  methods  to  it. 

2.1  Definitions  and  Terminology 

The near-end transmitted data signal is defined as

$$u_1(t) = \mathrm{Re}\Big\{ \sum_k x(kT_s)\, p(t-kT_s)\, e^{j\omega_c t} \Big\} , \tag{2-1}$$

where the inphase and quadrature data symbols are the real and
imaginary parts of $x(kT_s)$, respectively. The carrier frequency
is $\omega_c/2\pi$, the baseband pulse shaping is $p(t)$, and $1/T_s$ is the
symbol rate. Also, "Re" denotes the real part of a complex
number. $u_1(t)$ is the real part of the analytic signal $U_1(t)$,

$$U_1(t) = \sum_k x(kT_s)\, p(t-kT_s)\, e^{j\omega_c t} . \tag{2-2}$$

The impulse response of the combined hybrid and channel path
is $h(t)$. The hybrid output, $d(t)$, is the sum of the echo and the
uncorrelated double-talking data signal and channel noise $u_2(t)$,

$$d(t) = h(t) * u_1(t) + u_2(t) , \tag{2-3}$$

where $*$ denotes continuous-time convolution.

$h(t)$ is also written as

$$h(t) = \mathrm{Re}\{ h_{BB}(t)\, e^{j\omega_c t} \} , \tag{2-4}$$

where $h_{BB}(t)$ is the baseband-equivalent [18] echo path for $h(t)$.
Then $d(t)$ becomes (see [18])

$$d(t) = \mathrm{Re}\Big\{ \sum_k x(kT_s)\, g(t-kT_s)\, e^{j\omega_c t} \Big\} + u_2(t) , \tag{2-5}$$

where $g(t)$ is

$$g(t) = \tfrac{1}{2}\, p(t) * h_{BB}(t) . \tag{2-6}$$

The echo estimate is $\hat{d}(t)$, and the error signal (double-talker
estimate) is

$$\epsilon(t) = d(t) - \hat{d}(t) . \tag{2-7}$$

The Minimum Mean-Square Error (MMSE) of $\epsilon(t)$ is

$$\mathrm{MMSE} = \min_{\hat{d}} E[\epsilon(t)^2] = \sigma_2^2 , \tag{2-8}$$

where $\hat{d}(t)$ becomes

$$\hat{d}(t) = \mathrm{Re}\Big\{ \sum_k x(kT_s)\, \hat{g}(t-kT_s)\, e^{j\omega_c t} \Big\} , \tag{2-9}$$

assuming a linear time-invariant estimator followed by a
modulator. $E[\cdot]$ denotes expectation.

2.2 The Data-Driven Echo Canceller (DDEC)

The DDEC appears in Figure 1. The adaptive transversal
filter acts at a tap-spacing, $T_s/l$, that is sufficiently short to
satisfy the Sampling Theorem for the entire passband
transmitted signal $u_1(t)$, where $l$ is an integer. The continuous
$U_1(t)$ is rewritten

$$U_1(t) = \sum_k x(kT_s)\, e^{j\omega_c kT_s}\, g(t-kT_s)\, e^{j\omega_c (t-kT_s)} \tag{2-10a}$$

$$\phantom{U_1(t)} = \sum_k \tilde{x}(kT_s)\, \tilde{g}(t-kT_s) , \tag{2-10b}$$

where

$$\tilde{x}(kT_s) = x(kT_s)\, e^{j\omega_c kT_s} \tag{2-11}$$

and

$$\tilde{g}(t) = g(t)\, e^{j\omega_c t} , \tag{2-12}$$

as discussed in [2]. In conventional echo-cancellation schemes
[2], the designer carefully chooses $\omega_c$ and $T_s$ so that the rotation
of the data symbols in Equation 2-11 is trivial (typically 270°).
Thus, we now drop the tilde, $\tilde{x}(kT_s) \rightarrow x(kT_s)$, in the ensuing
results. The DDEC then synthesizes $\tilde{g}(t)$ at rate $T_s/l$.
Since $d(t)$ is real, $\hat{d}(t)$ should also be real, or

$$\hat{d}(kT'_s) = W^R_{N,k}\, \mathrm{Re}\{X_N(k)\} - W^Q_{N,k}\, \mathrm{Im}\{X_N(k)\} , \tag{2-13}$$

where "Im" denotes imaginary part, $W^R_{N,k}$ and $W^Q_{N,k}$ are the
real and imaginary parts of the adaptive transversal filter
(complex $1 \times Nl$ row vector) that estimates $\tilde{g}(t)$, $N$ is the order
of (or number of spanned symbol periods in) the DDEC, and
$X_N(k)$ is the column vector ($Nl \times 1$) corresponding to the last $N$
DDEC inputs at the sampling rate $1/T'_s$,

$$X_N(k) = [\, x(kT_s)\ 0 \cdots 0\ \ \cdots\ \ x([k-N+1]T_s)\ 0 \cdots 0 \,]^T , \tag{2-14}$$

where a superscript of $T$ denotes transpose.

2.3 Subcancellers

Detailed analyses of the use of subcancellers appear in
[2,3,5]. The essential structural simplification arises because $N$
(not $Nl$) taps in the transversal filters contribute to $\hat{d}(t)$ at any
sampling instant (see the zeros in 2-14). The structure is equivalent
to $l$ sub-echo cancellers or "subcancellers" that independently
act to estimate the $l$ phases (per symbol period) of the desired
echo-contaminated output. We add the new observation that
the same inputs appear in each subcanceller, and the majority of
computation in the FTF (or any fast-RLS) algorithms, which
depends only on these inputs, need only be performed once for
the group of subcancellers, even when the training sequence is
unknown. Since $l \ge 4$ in practical voiceband modems for
full-duplex data communications, this leads to large
computational and storage savings.

2.4 The Application of RLS to the DDEC

The RLS DDEC chooses $W^i_{N,k} = W^{R,i}_{N,k} + jW^{Q,i}_{N,k}$ to
minimize (for the $i$th subcanceller)

$$\epsilon^i_N(k) = \sum_{m=0}^{k} \big( d(mT_s + iT'_s) + W^{R,i}_{N,k} X^R_N(mT_s) + W^{Q,i}_{N,k} X^Q_N(mT_s) \big)^2 , \tag{2-15}$$

where $i = 0, \ldots, l-1$, $X^R_N(mT_s)$ and $X^Q_N(mT_s)$ are the real and
imaginary parts of

$$X_N(mT_s) \triangleq [\, x(mT_s)\ \cdots\ x([m-N+1]T_s) \,]^T , \tag{2-16}$$

respectively, and $T'_s = T_s/l$. Minimization of Equation 2-15
directly requires the two-channel (real) FTF algorithms of
[13-14]. However, a single-channel FTF algorithm can be used
(during training) by staggering the inphase and quadrature data
sequences (setting either or both to zero at appropriate time
instants). The single-channel algorithm is less costly to implement
than the two-channel algorithm. More about specific
implementations and comparisons appears in Section 3.
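The subcanceller decomposition of Section 2.3 can be sketched numerically. The example assumes real-valued taps and symbols and uses hypothetical function names. It shows only the structural identity: an Nl-tap filter driven by the zero-stuffed input of Equation 2-14 produces the same output as l separate N-tap filters that all see the same N symbols.

```python
import random


def full_rate_output(taps, symbols, l, n):
    """Output at sample n of an (N*l)-tap filter fed with zero-stuffed symbols."""
    total = 0.0
    for j, w in enumerate(taps):
        idx = n - j                       # input sample index at rate l/T_s
        if idx >= 0 and idx % l == 0:     # only every l-th input sample is nonzero
            total += w * symbols[idx // l]
    return total


def subcanceller_output(taps, symbols, l, k, i):
    """The same output from subcanceller i, which owns taps i, i+l, i+2l, ..."""
    sub = taps[i::l]                      # N taps out of N*l
    return sum(w * symbols[k - q] for q, w in enumerate(sub) if k - q >= 0)


random.seed(0)
N, L = 5, 4
SYMBOLS = [random.choice((-1.0, 1.0)) for _ in range(12)]
TAPS = [random.uniform(-1.0, 1.0) for _ in range(N * L)]

# Sample n = k*L + i is phase i of symbol period k.
MAX_DIFF = max(
    abs(full_rate_output(TAPS, SYMBOLS, L, k * L + i)
        - subcanceller_output(TAPS, SYMBOLS, L, k, i))
    for k in range(len(SYMBOLS)) for i in range(L)
)
print(MAX_DIFF)
```

Because every subcanceller is driven by the same symbol vector, any quantity that depends only on the inputs can be shared among all l of them.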

The solution to 2-15, when the DDEC staggers the training
sequence, is

$$W^{R,i}_{N,k} = -\Big( \sum_{m=0}^{k} d(mT_s + iT'_s)\, X^R_N(mT_s)^T \Big) \Big( \sum_{m=0}^{k} X^R_N(mT_s)\, X^R_N(mT_s)^T \Big)^{-1} \tag{2-17a}$$

and

$$W^{Q,i}_{N,k} = -\Big( \sum_{m=0}^{k} d[iT'_s + (m+k+N)T_s]\, X^Q_N[(m+k+N)T_s]^T \Big) \Big( \sum_{m=0}^{k} X^Q_N[(m+k+N)T_s]\, X^Q_N[(m+k+N)T_s]^T \Big)^{-1} , \tag{2-17b}$$

where $k \ge N$, and $k$ is the number of learning iterations for the
subcanceller. (The reason for $k+N$ in Equation 2-17b is
discussed in Section 3, and has to do with the aforementioned
staggering.) No generality or performance is lost by
choosing $X_N(mT_s)$ such that

$$X^Q_N(mT_s + (k+N)T_s) = X^R_N(mT_s) , \qquad 0 \le m \le k-1 . \tag{2-18}$$

Thus, for all $2l$ subcancellers ($l$ inphase and $l$ quadrature), one
need only invert the same matrix

$$R_N(k) \triangleq \sum_{m=0}^{k} X^R_N(mT_s)\, X^R_N(mT_s)^T . \tag{2-19}$$

The FTF methods invert this matrix only implicitly to obtain an
equivalent set (much less storage) of parameters ($C_{N,k}$ filters or
"Kalman Gain," see [16]). All of the computation in the
equivalent of the inversions can be done off line since the training
sequence $X^R_N(kT_s)$ is known beforehand. No useful off-line
computation can be performed in the SF method, which also is
restricted to the use of pseudorandom training sequences.
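The observation that one matrix serves every subcanceller can be illustrated with a direct least-squares sketch. It is not the FTF algorithm: the matrix is inverted explicitly by Gauss-Jordan elimination, the data are real and noise-free, and the helper names (`invert`, `regressors`, `train_subcancellers`) are hypothetical. The sign convention follows Equation 2-15, where the error is d + WX, so the solution is the negative of the echo path.

```python
import random


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regressors(symbols, order):
    """Prewindowed vectors [x(m), ..., x(m-order+1)], zeros before m = 0."""
    padded = [0.0] * (order - 1) + list(symbols)
    return [padded[m:m + order][::-1] for m in range(len(symbols))]


def train_subcancellers(symbols, outputs_per_phase, order):
    xs = regressors(symbols, order)
    # R = sum_m X(m) X(m)^T depends only on the training symbols.
    r = [[sum(x[a] * x[b] for x in xs) for b in range(order)]
         for a in range(order)]
    r_inv = invert(r)                       # computed once for all phases
    weights = []
    for d in outputs_per_phase:
        cross = [sum(dm * x[a] for dm, x in zip(d, xs)) for a in range(order)]
        # W = -(sum_m d(m) X(m)^T) R^{-1}, cf. Equation 2-17a.
        weights.append([-sum(cross[a] * r_inv[a][b] for a in range(order))
                        for b in range(order)])
    return weights


random.seed(1)
ORDER, PHASES, ITERATIONS = 3, 4, 12
SYMBOLS = [random.choice((-1.0, 1.0)) for _ in range(ITERATIONS)]
ECHO_PATHS = [[random.uniform(-1.0, 1.0) for _ in range(ORDER)]
              for _ in range(PHASES)]
OUTPUTS = [[sum(g * x for g, x in zip(path, xv))
            for xv in regressors(SYMBOLS, ORDER)] for path in ECHO_PATHS]
WEIGHTS = train_subcancellers(SYMBOLS, OUTPUTS, ORDER)
```

With noise-free outputs each weight vector equals the negated echo path of its phase, and the inverse is formed once regardless of the number of phases.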

2.5 Performance Analysis of RLS DDEC

We assume that $NT_s$ exceeds the nonzero time extent of
$g(t)$. Under this common assumption, the RLS estimates of
$W^{R,i}_{N,k}$ and $W^{Q,i}_{N,k}$ are unbiased after $k \ge N$ iterations (see
[13,14]), that is,

$$E[W^{R,i}_{N,k}] = \mathrm{Re}\{[\, g(iT'_s)\ \cdots\ g((N-1)T_s + iT'_s) \,]\} \triangleq \mathrm{Re}\{G_i\} \tag{2-20a}$$

$$E[W^{Q,i}_{N,k}] = \mathrm{Im}\{G_i\} , \tag{2-20b}$$

irrespective of the double-talker's presence. $G_i$ is the impulse
response for the $i$th echo subchannel. This unbiased estimator
property is not exhibited by stochastic-gradient solutions until a
much later time. The (RLS) covariance matrix ([13,14]) for
either $W^{R,i}_{N,k}$ or $W^{Q,i}_{N,k}$ is (for $k \ge N$, and white channel noise)

$$\mathrm{cov}[W^{R,i}_{N,k}] = \mathrm{cov}[W^{Q,i}_{N,k}] = R_N^{-1}(k)\, \sigma_2^2 , \tag{2-21}$$

where

$$\sigma_2^2 = E[u_2(t)^2] , \tag{2-22}$$

and $R_N(k)$ is given in 2-19. Thus, the RLS solution is near
optimum after $N$ iterations only if $\sigma_2^2$ is small. One achieves
small $\sigma_2^2$ by intentionally silencing the double-talking data signal
during training [12]. This leaves $\sigma_2^2$ equal to the residual
channel-noise power level, which is typically much smaller than
the power levels of the other signals in the problem. However,
the stochastic-gradient methods will still be far from optimum
because of the biased mean of $W^{R,i}_{N,k}$ or $W^{Q,i}_{N,k}$ after only $N$
iterations. They take about 5-10 times longer [11,16].

Figure 2 simulates a situation typical of 4800 bps
full-duplex data transmission. The order is 22, while the
number of subcancellers $l$ is 4. The performance improvement
of the RLS methods is illustrated by the staggered prewindowed
FTF solution, which permits training more rapidly than the
stochastic-gradient solution.

We now turn to comparing the various RLS solutions.
Paralleling Salz [12], we can use the trace of $\mathrm{cov}[W^{R,i}_{N,k}]$ as an
indicator of quality. Salz [12] shows that if a length-$N$
pseudorandom sequence is used, then

$$\mathrm{trace}\{ \mathrm{cov}[W^{R,i}_{N,N-1}] \} = \frac{2N}{N+1}\, \sigma_2^2 , \tag{2-23}$$

while it is trivial to show for a purely white training sequence
(after $N$ iterations)

$$\mathrm{trace}\{ \mathrm{cov}[W^{R,i}_{N,N-1}] \} = \sigma_2^2 . \tag{2-24}$$

Thus, the pseudorandom training sequence is about 3 dB worse
over $N$ iterations than a white training sequence, under the
$\mathrm{cov}[W^{R,i}_{N,k}]$ criterion. Equivalently, it is also easy to show [12]
(for pseudorandom) that over $2N$ iterations

$$\mathrm{trace}\{ \mathrm{cov}[W^{R,i}_{N,2N-1}] \} = \frac{N}{N+1}\, \sigma_2^2 , \tag{2-25}$$

or it takes about twice as long to reach the same performance
level with pseudorandom training in comparison to white
training.

The above performance comparisons also hold for the
multichannel case (see [16]), in which the inphase and quadrature
responses are simultaneously computed.

We also show in Section 3, after further defining the
windowing methods, that in terms of double-talker estimate
quality, the pseudorandom training sequence is the worst
possible choice, so that the FTF methods offer substantial
improvements in comparison to the SF methods.

3. RLS DDEC ALGORITHM COMPARISON

This section lists and compares the various initialization
procedures for the RLS DDEC (see Tables 1-3).

3.1 New FTF Solutions for DDEC Initialization

An important component in assessing performance and
learning time of the RLS DDEC is the data window for the
sum-of-squared-errors criterion (Equation 2-15). In the DDEC,
essentially two windowing cases are of interest: the
prewindowed case and the Growing-Memory Covariance
(GMC, "unwindowed") case (see [13,14]). The prewindowed
FTF solutions assume that all data before the very first iteration
are zero. The more general GMC case allows these data to take
arbitrary values. The GMC method is only necessary for the
DDEC if one desires the autocorrelation matrix to assume some
exact, prespecified form on the $N$th iteration of the
initialization, as in the SF method [12]. This fixing of
the autocorrelation matrix mandates the "priming" of the echo
channel with approximately $N$ nonzero data values prior to the
first iteration of the algorithm, which adds an additional $N$ ($2N$
in the multichannel or QAM case) delay (in symbol periods) to the
learning time. In the prewindowed solution, there is absolutely
no need for this priming, thus leading to a reduction in learning
time. Both experimentally and analytically, the elimination of
priming is not a significant drawback for the prewindowed
algorithm.
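The difference between the two windows can be seen directly in the autocorrelation matrix. The sketch below is an illustration under stated assumptions, not the paper's simulation: it uses a length-15 binary m-sequence as the pseudorandom training sequence, takes N = 15, and accumulates N iterations. The function names are hypothetical.

```python
def m_sequence_15():
    """One period of a +/-1 m-sequence from a[k+4] = a[k] XOR a[k+1]."""
    bits = [1, 1, 1, 1]
    while len(bits) < 15:
        bits.append(bits[-4] ^ bits[-3])
    return [1 - 2 * b for b in bits]


def regressor(seq, m, order, primed):
    """[x(m), ..., x(m-order+1)]; samples before m = 0 are zero (prewindowed)
    or come from the previous period of the sequence (primed, covariance)."""
    out = []
    for q in range(order):
        idx = m - q
        if idx >= 0:
            out.append(seq[idx])
        else:
            out.append(seq[idx % len(seq)] if primed else 0)
    return out


def autocorrelation_matrix(seq, order, primed):
    xs = [regressor(seq, m, order, primed) for m in range(len(seq))]
    return [[sum(x[a] * x[b] for x in xs) for b in range(order)]
            for a in range(order)]


SEQ = m_sequence_15()
ORDER = len(SEQ)
R_PRIMED = autocorrelation_matrix(SEQ, ORDER, primed=True)
R_PREWINDOWED = autocorrelation_matrix(SEQ, ORDER, primed=False)
print(R_PRIMED[0][:4], [R_PREWINDOWED[a][a] for a in range(4)])
```

Only the primed window reproduces the exact structure (N on the diagonal, -1 elsewhere) after N iterations. The prewindowed matrix has a diagonal that tapers from N down to 1, which is why the prewindowed solution makes no use of any special sequence structure and accepts any nonsingular choice.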

Another important component of the DDEC initialization, in
terms of learning time and computation, is the choice of a
single-channel or a multichannel solution. The staggered
single-channel solution requires one-half the computation of the
multichannel solution, but can lead to an extra $N$ units of delay
in the prewindowed case. Specifically, the proposed staggered
single-channel solution first transmits and trains upon the
inphase echo channel ($W^{R,i}_{N,k}$, $i = 0, \ldots, l-1$), while
simultaneously zeroing (suppressing) the quadrature training
sequence. Then $N$ symbol periods of suppressing both inphase
and quadrature sequences follow to clear the echo channel. The
third and final step is to transmit only the quadrature
training sequence (usually the same sequence, see Equation
2-17b), while suppressing inphase signals. Since inphase
and quadrature estimation are separated in time, a single-channel
algorithm is used twice, once for inphase training and once for
quadrature training.
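The three-step staggered schedule can be sketched as the construction of one complex symbol stream. The function name is hypothetical, and the example assumes that the same real sequence is reused for the quadrature phase, as the text says is usual.

```python
def staggered_training(sequence, order):
    """Inphase-only symbols, then `order` silent symbols, then quadrature-only."""
    inphase = [complex(s, 0.0) for s in sequence]      # quadrature zeroed
    clearing = [0j] * order                            # both channels zeroed
    quadrature = [complex(0.0, s) for s in sequence]   # inphase zeroed
    return inphase + clearing + quadrature


TRAINING = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]           # arbitrary example, k = 6
ORDER = 3                                              # N
STREAM = staggered_training(TRAINING, ORDER)
print(len(STREAM))   # 2k + N symbol periods in total
```

At every instant at most one of the two channels is active, so a real single-channel algorithm can process each phase in turn.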

The prewindowed single-channel (staggered) FTF
algorithm is then ($k \ge N-1$)

Prewindowed (Single-Channel) FTF ($i = 0, \ldots, l-1$)

i). $0 \le m \le k$ (zero quadrature training sequence)

$$\epsilon^{R,i}_N(m) = d(mT_s + iT'_s) + W^{R,i}_{N,m-1} X^R_N(mT_s) \tag{3-1a}$$

$$W^{R,i}_{N,m} = W^{R,i}_{N,m-1} + \epsilon^{R,i}_N(m)\, C_{N,m} \tag{3-1b}$$

ii). $k < m \le k+N-1$ (zero both inphase and quadrature
training sequences)

iii). $k+N-1 < m \le 2k+N-1$ (zero inphase training sequence)

$$\epsilon^{Q,i}_N(m) = d(mT_s + iT'_s) + W^{Q,i}_{N,m-1} X^Q_N(mT_s) \tag{3-1c}$$

$$W^{Q,i}_{N,m} = W^{Q,i}_{N,m-1} + \epsilon^{Q,i}_N(m)\, C_{N,m} , \tag{3-1d}$$

while the GMC (unwindowed) version is

Covariance (Single-Channel) FTF ($i = 0, \ldots, l-1$)

i). $0 \le m \le N-2$ (nonzero inphase priming, zero quadrature)

ii). $N-1 \le m \le k+N-1$ (while zeroing quadrature)

$$\epsilon^{R,i}_N(m) = d(mT_s + iT'_s) + W^{R,i}_{N,m-1} X^R_N(mT_s) \tag{3-2a}$$

$$W^{R,i}_{N,m} = W^{R,i}_{N,m-1} + \epsilon^{R,i}_N(m)\, C_{N,m} \tag{3-2b}$$

iii). $k+N \le m \le k+2N-1$ (zero inphase, prime quadrature)

iv). $k+2N \le m \le 2k+2N$ (while zeroing inphase)

$$\epsilon^{Q,i}_N(m) = d(mT_s + iT'_s) + W^{Q,i}_{N,m-1} X^Q_N(mT_s) \tag{3-2c}$$

$$W^{Q,i}_{N,m} = W^{Q,i}_{N,m-1} + \epsilon^{Q,i}_N(m)\, C_{N,m} . \tag{3-2d}$$

The filter $C_{N,m}$ is computed from the known training sequence
beforehand and stored for $0 \le m \le k$ (see [16]). In general in the
prewindowed initialization, at least one-half of the total
coefficients (of $C_{N,m}$) are always zero, leading to a reduction in
both computation and storage in comparison to the covariance
case, in which no such simplification generally arises. The
covariance algorithm can use a training sequence that is exactly
white over $N$ iterations.

The multichannel algorithms determine the inphase and
quadrature responses simultaneously. The multichannel
prewindowed algorithm is ($k \ge 2N-1$)

Prewindowed (Multichannel) FTF ($i = 0, \ldots, l-1$)

$0 \le m \le k$

$$\epsilon^{RQ,i}_N(m) = d(mT_s + iT'_s) + W^{RQ,i}_{N,m-1} X^{RQ}_N(mT_s) \tag{3-3a}$$

$$W^{RQ,i}_{N,m} = W^{RQ,i}_{N,m-1} + \epsilon^{RQ,i}_N(m)\, C_{N,m} , \tag{3-3b}$$

while the covariance case is

Covariance (Multichannel) FTF ($i = 0, \ldots, l-1$)

i). $0 \le m \le 2N-1$ (prime inphase and quadrature)

ii). $2N \le m \le k+2N$

$$\epsilon^{RQ,i}_N(m) = d(mT_s + iT'_s) + W^{RQ,i}_{N,m-1} X^{RQ}_N(mT_s) \tag{3-4a}$$

$$W^{RQ,i}_{N,m} = W^{RQ,i}_{N,m-1} + \epsilon^{RQ,i}_N(m)\, C_{N,m} . \tag{3-4b}$$

Table 1 compares the algorithms in Equations 3-1 through 3-4.

In Table 1, the single-channel (staggered) prewindowed
FTF (Equations 3-1a, b, c, d) has the lowest computational
requirements. The operation of this particular method was
verified in Figure 2. Table 2 is a specific comparison for the
conditions of Figure 2.

In Figure 2, we chose the training sequence arbitrarily
(actually a pseudorandom sequence of length >65,000 >> 22). The
true echo-channel impulse response was known for the
simulation, and we computed and plotted the quantity of
Equations 2-23 and 2-24 (norm tap error vector). There, one
sees that the DDEC converges in about $k = N$ (= 22) iterations at
$T_s$ (or 88 iterations at $T'_s$, as is shown in Figure 2), and the
choice of training sequence is not critical. We have used $k = 100$
iterations at rate $T_s$ (and 100 iterations of channel clearing
between inphase and quadrature) to illustrate another advantage
of our approach: the FTF solutions can be propagated for
any number of iterations to further "fine-tune" the solution.
However, the minimum of $2N$ is used for the total prewindowed
learning time and total computation figures in Table 1. The
short learning period in all the methods of Table 1 is caused by
$\sigma_2^2$ being very low (the double-talker is silenced for training).

Furthermore, one uses the formula for excess error from
Section 2.2 of [13] to obtain

$$\text{excess MSE} = \sigma_2^2\, (1 - \gamma_N(k)) , \tag{3-5}$$

where

$$\gamma_N(k) \triangleq 1 - X_N^T(k)\, R_N^{-1}(k)\, X_N(k) . \tag{3-6}$$

The worst (maximum of 3-5) that the echo canceller can do at
any time ($k \ge N-1$) is

$$\text{excess MSE} = \sigma_2^2 . \tag{3-7}$$

In this case, the worst possible RLS MSE after echo cancellation
is thus

$$\mathrm{MSE} = 2\sigma_2^2 . \tag{3-8}$$

This worst possible performance of prewindowed RLS (which is
nevertheless a dramatic improvement over stochastic-gradient
methods) is achieved by the pseudorandom-trained SF method
or the exactly white-trained GMC FTF method when $k = N$.
Thus, any other training sequence* for the FTF performs at least
as well under the excess MSE measure.

* The training sequence is not completely arbitrary in that the
autocorrelation matrix must be nonsingular, precluding
ridiculous choices such as all zeros or DC.

3.2 The SF Method

[16] lists the SF method in terms of the quantities defined
in this paper. One should note immediately that only in the SF
methods is the number of iterations frozen beforehand. Table 1
lists the SF methods as covariance methods, since they require
priming of the channel with the pseudorandom sequence before
the first iteration to ensure the desired structure of the
underlying autocorrelation matrix.

3.3 Storage Requirements (Initialization)



The random-access-memory (RAM) requirements of all
the various algorithms above are about the same, $2Nl + N$ RAM
storage locations. However, the proposed FTF methods of this
paper also require a significant amount of read-only memory
(ROM) if one desires to store the quantity $C_{N,m}$ over the
initialization interval rather than compute it on line. Just how
much storage depends upon the window and also upon the
number of channels (single- or multichannel). The storage
requirements appear in Table 3, in general and under the
conditions of Figure 2 ($N = 22$, $l = 4$). There one determines that
the ROM requirements are not substantial by modern modem
standards, especially when one considers that many kilobytes of
software code are usually now found in commercial
microprocessor-controlled modem products.
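The trade of ROM for computation can be sketched as follows. This is conventional RLS with explicit matrix inversion, not the FTF recursions: the gain vectors are computed off line from the known training sequence and the on-line loop only forms an error and a scaled vector addition, in the pattern of Equations 3-1a and 3-1b. The small regularizer `delta` stands in for the soft-constraint initialization, and all function names and parameter values are assumptions made for the illustration.

```python
import random


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regressors(symbols, order):
    """Prewindowed vectors [x(m), ..., x(m-order+1)], zeros before m = 0."""
    padded = [0.0] * (order - 1) + list(symbols)
    return [padded[m:m + order][::-1] for m in range(len(symbols))]


def precompute_gains(symbols, order, delta=1e-6):
    """Off line: C_m = -R_m^{-1} X_m with R_m = delta*I + sum_{n<=m} X_n X_n^T."""
    r = [[delta if a == b else 0.0 for b in range(order)] for a in range(order)]
    gains = []
    for x in regressors(symbols, order):
        for a in range(order):
            for b in range(order):
                r[a][b] += x[a] * x[b]
        r_inv = invert(r)
        gains.append([-sum(r_inv[a][b] * x[b] for b in range(order))
                      for a in range(order)])
    return gains


def train_online(symbols, desired, gains, order):
    """On line: error e = d + W X, then W <- W + e * C_m (stored gain)."""
    w = [0.0] * order
    for x, d, c in zip(regressors(symbols, order), desired, gains):
        err = d + sum(wa * xa for wa, xa in zip(w, x))
        w = [wa + err * ca for wa, ca in zip(w, c)]
    return w


random.seed(2)
ORDER, ITERATIONS = 3, 12
SYMBOLS = [random.choice((-1.0, 1.0)) for _ in range(ITERATIONS)]
ECHO = [random.uniform(-1.0, 1.0) for _ in range(ORDER)]
DESIRED = [sum(g * x for g, x in zip(ECHO, xv))
           for xv in regressors(SYMBOLS, ORDER)]
GAINS = precompute_gains(SYMBOLS, ORDER)
W = train_online(SYMBOLS, DESIRED, GAINS, ORDER)
ROM_WORDS = len(GAINS) * ORDER        # storage if every gain entry is kept
```

In this noise-free example the weights converge to the negated echo path, and the stored table grows as the number of iterations times the order. The paper's observation that at least half of the prewindowed gain coefficients are zero would reduce that figure.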

4.  SUMMARY 

Computationally efficient Recursive-Least-Squares (RLS)
procedures have been presented specifically for the adaptive
adjustment of the Data-Driven Echo Cancellers (DDECs) that
are used in high-speed full-duplex data transmission over
two-wire telephone lines. The methods have been shown to
yield very short learning times for the DDEC while
simultaneously reducing computational requirements to
levels below those that are required by the most efficient
existing RLS (SF) method [12]. During the initialization period,
the new numerically stable methods significantly outperform
slower-learning stochastic-gradient (LMS) solutions while also
requiring no more computational operations than these same
LMS solutions.

The new methods can be used with any training sequence
over any number of iterations. The new methods are
applications of the Fast Transversal Filter (FTF) RLS adaptive
filtering algorithms of [13-14]. However, we additionally
exploit several special features of the DDEC to dramatically
reduce computation below the levels that would have been
required for a straightforward use of these FTF algorithms.
Several trade-offs between computation, memory, learning time,
and performance have been illustrated. The results of this paper
can now be used to design cost-effective high-performance
DDECs for full-duplex data communications with acceptable
"start-up".